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Summary: A novel point process model continuous in space-time is proposed for quantifying the transmission dynamics of 
the two most common meningococcal antigenic sequence types observed in Germany 2002-2008. Modelling is based on the 
conditional intensity function (GIF) which is described by a superposition of additive and multiplicative components. As an 
epidemiological interesting finding, spread behaviour was shown to depend on type in addition to age: basic reproduction 
numbers were 0.25 (95% GI 0.19-0.34) and 0.11 (95% GI 0.07-0.17) for types B:P1.7-2,4:Fl-5 and G:P1.5,2:F3-3, respectively. 
Altogether, the proposed methodology represents a comprehensive and universal regression framework for the modelling, 
simulation and inference of self-exciting spatio-temporal point processes based on the GIF. Usability of the modelling in 
biometric practice is promoted by an implementation in the R package surveillance. 
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1. Introduction 

Infectious diseases - such as influenza, gastroenteritis, and the “swine flu” among humans, or foot 
and mouth disease, the “bird flu”, and classical swine fever among animals - are a matter of tremen¬ 
dous public concern especially gaining attention in case of outbreaks. The present work concentrates 
on stochastic modelling and associated inference for spatio-temporal epidemic point referenced data 
motivated by the analysis of routinely collected invasive meningococcal disease (IMD) data. IMD is 
a life-threatening human bacterial disease mostly manifesting as meningitis or sepsis. Its pathogenic 
agent, Neisseria meningitidis (aka meningococcus), can be transmitted by large droplet secretions 
from the respiratory tract of colonized or infected humans. The only reservoir of meningococci 
is the human (mostly nasopharyngeal) mucosa (Rosenstein et al., 2001). Data on cases of IMD 
related to the two most common meningococcal hnetypes B:P1.7-2,4:Fl-5 and C:P1.5,2:F3-3 in 
Germany 2002-2008 are obtained from the German Reference Gentre for Meningococci (Nationales 
Referenzzentrum fiir Meningokokken, NRZM). Here, a ’hnetype’ represents a unique combination 
of serogroup, sequence type of variable region 1 and 2 of the outer membrane protein PorA, and 
sequence type of the variable region of the outer membrane protein FetA. One specihc question of 
interest for the researchers at the NRZM is whether the two hnetypes (in what follows abbreviated 
B and G) exhibit different spatio-temporal behaviour. 

The postal code of the patient’s home address was the spatial resolution available for our analysis. 
Despite being spatially discrete we consider centroids of postal code areas as quasi-continuous in 
space when looking at entire Germany. As usual with infectious diseases, the actual time point of 
infection is unknown for the IMD cases. Therefore, we dehne the beginning of illness and infectivity 
as the date of specimen sampling. 
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(a) Finetype B:P1.7-2,4:Fl-5. 


(b) Finetype C:P1.5,2:F3-3. 


Figure 1. Monthly numbers of IMD cases for both finetypes separately. 


All in all, n = 636 infections with finetypes B (336) and C (300) have been registered. Figure 
shows the monthly numbers of IMD cases for each finetype. Cases of IMD predominantly occur 
during winter and early spring, which can be seen from more or less pronounced peaks in the 
figure. Specifically, a connection between outbreaks of meningococci and influenza is hypothesized. 
For example, Jensen et ah (2004) found an association between the influenza detection rate and the 
number of IMD cases during the same week in temporal analysis of data from Northern Jutland 
County in Denmark, during 1980-1999. 

Figure presents the spatial distributions of the two finetypes based on the postcodes of the 
patients’ residences. Over the 7-year period some cases shared the same postal code, therefore, the 
area of each point in the figure is drawn proportional to the number of cases at its location. For the 
serogroup B finetype in |(a) the highest point multiplicity is 16, whereas for the serogroup C finetype 
(b) this number is 4. In connection with the temporal occurrence of the events shown in Figure 


m 


the spatial distribution suggests that IMD is an endemic disease, i.e. cases can occur at any time 
and at any location. The maps also show the population densities of the districts, which can be 
assumed to be roughly proportional to the population at risk of infection. Spatial heterogeneity of 
the observed point patterns thus partially arises from spatial variation in the population density. 
Not surprisingly, the intensity of points in metropolitan areas like Berlin, Munich or the Ruhr 
is higher. Animated graphics of the space-time locations of infections give more insight into the 
epidemic character of the finetypes, and can be found as Web Animation 1. Here, it appears as if 
finetype B exhibits a more stationary pattern than finetype C - in the sense that infections cluster 
more in space and time. It is supposed, yet not proven, that this phenomenon is due to differences 
in the mucosal immune reaction elicited; specifically, finetype B might be more successful than C 
in evading mucosal clearance. 

Quantifying the dynamics of IMD would be an important step in the finetype characterisation 
of IMD. We want to perform such an investigation in a spatio-temporal manner and therefore 
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(a) Finetype B:P1.7-2,4:Fl-5. 


(b) Finetype C:P1.5,2:F3-3. 


Figure 2. Spatial point patterns of the cases of meningococci by finetype dnring the years 2002- 
2008. The area of each dot is proportional to the nnmber of cases at its location. Also shown are 


the popnlation densities (inhabitants per km^) of Germany’s districts (sonrce: Federal Statistical 


Office (DESTATIS) (2009)). 


nse spatio-temporal point processes as modelling framework. Specihcally, we want to establish a 
regression framework allowing us to quantify the transmission dynamics of IMD and its dependency 
on covariates. Point process modelling has in the context of epidemics been used in a discrete spatial 
setting in, e.g., Neal and Roberts (2004), Diggle (2006), Scheel et ah (2007) and Jewell et al. (2009). 
Spatio-temporal epidemic modelling in an explicit continuous spatial setting, however, is rare with 


Diggle et al. (2005) being one of the few examples of covariate adjusted modelling. One explanation 


is the balancing between optimal spatial resolution of the data and conhdentiality of cases. 

Recently, there have been suggestions for splitting the dynamics of infectious diseases into endemic 
and epidemic components; see Held et al. (2005) for a discrete spatial - discrete time perspective 


and Hohle (2009) for a discrete spatial - continuous time perspective. For the continuous spatial 
- continuous time setting, similar modelling approaches have been seen in the analysis of earth¬ 
quake data, see e.g. Ogata (1998, 1999). Other areas of application are the modelling of forest 


hres (Peng et al., 2005), residential burglaries (Mohler et al., 2010), and the analysis of bird nesting 


patterns (Diggle et al, 


2009). Altogether, our proposed modelling provides a unifying regression 
for the modelling, inference and simulation of spatio-temporal 


framework - beyond epidemics 
point processes. 

This article is organized as follows: Section presents the spatio-temporal two-component epi¬ 
demic model based on the GIF, whereas Sections and discuss inference and simulation for the 
proposed model. Section [^analyses the IMD data, and a discussion in Section]^ dualizes the article. 
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2. Spatio-Temporal Two-Component GIF Model 

In the following text, we propose a novel additive-multiplicative model for the conditional intensity 
function of an infectious disease process continuous in space-time with events occuring in a prespec- 
ihed observation period [0,T], T > 0, and observation region W C The GIF A*(t, s) represents 
the instantaneous rate or hazard for events at time t and location s given all the observations up 
to time t (the asterisk notation shall represent the conditioning on the random past history of the 
process). 

The basic framework of the proposed model is to superimpose endemic and epidemic components 
in order to model the IMD surveillance data - an idea similar to the two-component spatial SIR 
model ( Hohlef 2009): 


X*{t, s) = h(t, s) + e*(t, s) (t > 0, s e W) 


The epidemic component e*(f, s) represents the spread of the disease by person-to-person contact. 
The endemic component s) models otherwise imported cases and is - contrary to the epidemic 
component - independent of the internal history of the process. 


2.1 Specification of the Endemic Component h(t,s) 

The endemic component is of the multiplicative form h{t, s) = p{t, s) exp{^'z{t, s)), where p{t, s) 
is a known spatio-temporal intensity offset, e.g. the population density at time t in the district 
containing the location s, such that the endemic rate of infection is proportional to the population 
density. Furthermore, z{t, s) is a linear predictor of endemic covariates, e.g., this could be a temporal 
trend or exogenous covariates resulting from another jointly evolving point process. For example, 
in the IMD application, an endemic covariate is the number of influenza cases on a week x district 
grid (possibly time-lagged). Altogether, the endemic component is modelled as a piecewise constant 
function on some spatio-temporal grid resulting from a decomposition of the time period (0, T] and 
the observation region W. The consecutive time intervals of this decomposition (e.g. weeks) are 
denoted by Ci, ..., Cd C (0, T], and the spatial tiles (e.g. districts) are denoted by Ai,..., Am C W . 
Let the functions r(f) and f{s) return the indices of the temporal and spatial grid units containing 
time point t and coordinate s, respectively. Then, the endemic component can be written as 

h{t, s) = pr{t),i{s) exp Zr{t),^{s)) , ( 1 ) 

where PT(t),^(s) is the known interval- and tile-specific offset and : r G {!,...,D},.^ G 

{!,..., M}} is a collection of covariates on the spatio-temporal grid {Ci,..., C^} x {Ai,..., Am}- 

2.2 Specification of the Epidemic Component e*{t,s) 

The self-exciting component of the model essentially provides a description of the infection pressure 
at a space-time location {t, s) caused by each infectious individual. This infectivity of an infectious 
individual j, denoted by ej{t, s), corresponds to the inhomogeneous rate of a Poisson process, the 
realisations of which are the space-time locations of infected individuals. This so called triggering 
function is factorized into separate effects of marks, elapsed time, and relative location: 

ej{t,s) = e^^ g{t-tj) f{s - Sj) , {t > tfi (2) 

where {tj, Sj) is the infection time and location of individual j, = 7 o + 'y'rrij is a linear predictor 
based on the vector of unpredictable marks rrij attached to the infected individual, and g and 
/ are positive temporal and spatial interaction functions, respectively. The effects 7 of marks 
reflect that different individuals might cause more or less secondary cases, depending on individual 
characteristics. 
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The interaction functions describe the decay of infectivity with an increasing spatial or tem¬ 
poral distance from the infection source. In infectious disease applications, / is often taken to 
be a radially symmetric kernel corresponding to an isotropic spread of the disease, such that 
f{s — Sj) = /(||s — Sj II). A typical example is to let / be the kernel of a bivariate normal density 
with zero mean and diagonal covariance matrix. The temporal interaction function could be chosen 
as g{t) = f > 0, a > 0, representing an exponential temporal decay of infectivity 

T^. 

The resulting epidemic component is the sum of the contributions (|^ of all infectious 

individuals at time t and location s. Formally, 

e*(f, s) = f l(o.e](^ - i) l[ 0 , 6 ](||s - s||) g{t - i) f{s - s) N{di x ds x dm) , 

J{0,t)xWxM 

= Sj) , (3) 

j£l*(t,s) 

where A4 is the mark space, N is the time-space-mark point process counting the infections and 
I*{t,s) := G {1,... , Ng{t—)} : —tj) = 1 A l[o, 5 ](||s — Sj||) = l| is the history-dependent 

set of infectives at time t and location s, where Ng{t—) = N{{0,t) x W x Ai). In the above, 
the hyperparameters £, 5 > 0 are introduced as known maximum temporal and spatial interaction 
ranges. A past event only influences the process at time t and location s, if both indicator functions 
are true, i.e. if it occurred at most £ time units ago at a location within distance 6. 

2.3 Characteristics of the Model 

Altogether, the proposed GIF model for a self-exciting spatio-temporal point process with compo¬ 
nents 0 and ([^ is 

\*{t, s) = pr{tU{s) exp [p' Zrp),^^s)) + dif “ tj) f {s - Sj) , 

j£l*it,s) 

which we shall call twinstim to indicate a fwo-component spatio-temporal (conditional) intensity 
model. For the proposed model an interesting quantity is the individual-specihc mean number pj 
of infections caused by individual j inside its spatio-temporal range of interaction: 

POO P 

lj,j= / ej(t, s)l(o,£](t-tj)l[o,5](||s-Sj||)dfds 

J 0 IR,^ 

= ■ / 9{t) ■ / /(s) ds . (4) 

JO Jfe(0,5) 

Here, b{0,6) denotes the disc centred at (0,0)’ with radius 6. The integration domain 1R+ x IR^ 
above stems from the theoretical point of view that the point process occurs in unlimited time 
and space. In practice this is not observable, but individuals near the border would be attributed 
a truncated value of pj if integrating over W - or, similarly, [0,T] - only. Such edge effects are 
overcome by (|^, which also simplihes interpretation by providing a quantity similar to the basic 
reproduction number Rq known from classical epidemic modelling. Specihcally, the number pj offers 
an intuitive way of interpreting the parameters 7 in the linear predictor rjj, because they can be 
handled as usual in Poisson regression models: a unit positive change in a specihc continuous mark 
mji multiplies the mean number of infections by the corresponding parameter e'*'*. 

2.4 Extension: Type-Specific twinstim 

Although the model of the previous subsection allows for a hnetype-specihc infectivity through the 
vector of unpredictable marks m^, it is not applicable for a joint modelling of both hnetypes. This 


(Hawkes 
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is because finetypes do not change during transmission. Therefore, the point process model will be 
extended to a marked version suitable for the specihc application of IMD and point patterns with 
different event types in general. 

Denote by/C = {!,...,iC} C N the set of possible event types. Dehne an indicator matrix 
Q = {(lk,i)k,i£ic, where qk,i € {0; 1}, which determines the possible ways of transmission. If qk^i 
equals 1, an infective type k event can cause an event of type 1. For instance, the IMD data would 
require Q = I 2 , because the transmission is hnetype-specihc. A marked spatio-temporal point 
process on (0,T] x IF x /C is then dehned by the following model for the GIF; 


\*{t, s, k) 

= h{t, s, k) + e*{t, s, k) 

hit, s, k) 

= Pr(i),$(s) exp (^/3o{k) + Zr(t),i(s)) 

e*{t, s, k) 




ej{t,s) 

= GWiVj) ■ git - tMj) ■ 

r{t,s,K) 

= {j e{l,...,Ng{t-)}: 


( 5 ) 


1 A l[ 0 , 5 ](||s - SjW) = 1 A = 1} . 


Here, the transmission indicators from the matrix Q have been integrated into I*{t,s,K). Note 
that the event type kj is now part of the vector rrij, which enables type-specihc epidemic in¬ 
tercepts as well as type interactions with individual covariates in the linear predictor rjj. The 
new endemic intercept /9o(^) either represents a type-specihc endemic intercept, i.e. /So(^) = 
J2k=i (^o,k^{k=K}{K-) = Po,K, or contains only a single global intercept (3o{k) = (3o, corresponding 
to the hypothesis /So = /So,i = ■ ■ ■ = /So.ic- For the remainder of the endemic predictor, the model 
assumes independence of k, which means that the effect of endemic covariates is homogeneous over 
the event types. However, the history-dependent set I*{t, s, k) of infective individuals now accounts 
for the transmission regime Q between the event types, and the interaction functions are allowed 
to depend on the type of the infective event as well. 


3. Statistical Inference 

This section deals with likelihood inference for the parameters of the GIF in (|^ based on the 
observed marked spatio-temporal point pattern x = {(tj, Sj, mj) :/ = !,... ,n}, where the event 
type Ki is part of the vector of marks rrii, and n is the number of events, i.e. a realisation of Ng{T). 
The parameter vector in question is 6 = (/Sq,/S', 7 ', cr', ck')', where cr and cx. are the parameter 
vectors of the spatial and temporal interaction functions and Qa, respectively. 

In our framework, no attempt is made to model unpredictable marks like gender and age but 
they are taken as given predictor variables in models of the GIF. In this case, the log-likelihood 


of the underlying point process on [0; T] x IF x Ad may be conveniently written as (Daley and 


Vere-Jones, 2003) 


I] log XliU, Si, 


Ki 


i=l 


/O JW 


Ag(t, s, k) dt ds . 


kGK 


The components of the above sum can be directly calculated for a specihc value of the parameter 
vector 6 after having determined the set Si, Ki) of potential sources of infection for the ith 
event. Furthermore, the integrations of the endemic and epidemic components of the GIF can be 
performed separately due to their additive superposition. Recalling that the endemic component is 
a piecewise constant function on the spatio-temporal grid {Ci ,..., Co} x {Ai,..., Am}, its integral 
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/o Jw 


( 6 ) 


is in fact a sum over this grid of smallest observed units in space-time: 

pT p / \ D M 

^ s,fi:)dtds = ( ^ exp (/3 o(k)) ) ■ exp (^' 2 ^, 5 ) . 

^ k&K. ^ t =15=1 

The integrated epidemic component can be simplified by moving the indicators of the function 
J*(t, s, k) back into the sum: 

rT r 

e*Q{t, s, k) dt ds 


Jo JW 




pT P n 

/ / ^ 1(0,e](^ ~ tj) l[o,5](||s — Sjll) qKj,K 6^'’ Qait ~ fcr{^ ~ ^j\^j) dt ds 

JO JW _ ■ 1 


kGK j=l 


0^3 


fmin{T—tj-,£} 


gait\Kj) dt 


' Rj 


fcT{s\Kj) ds 


(7) 


i=i 

Here, := is the number of different event types that can be triggered by an event 

of type Kj, and Rj := |hT fl b{sj] 5)| — Sj is the spatial interaction region of the jth event centred 
at its location. 

The evaluation of the two-dimensional integral over the domains Rj is the most sophisticated task 
of the log-likelihood evaluation. Meyer (2009) compared accuracy and speed of different cubature 
rules for performing the numerical integration. Here, the two-dimensional midpoint rule (see e.g. 


Stroud, 1971) proved to be best suited for the task. In contrast, the evaluation of the definite 


integral over the temporal interaction function is analytically accessible for typical choices of 
Altogether, an analytical maximisation of the above log-likelihood is not feasible, and a numerical 
optimisation routine such as BFGS (see e.g. Nocedal and Wright, 1999, Section 8.1) is required. 
Here, it is advantageous to know the score function s{6), which is derived in Web Appendix A. 
Uncertainty of the parameter estimates is deduced from the expected Fisher information X(0) as 
estimated by the “optional variation process” adapted to the marked spatio-temporal setting - see 
Web Appendix B for details. Significance of specific model parameters can be investigated by Wald 
or likelihood ratio tests and model selection is performed based on Akaike’s information criterion 
(AIC). 


4. Simulation Algorithm 

In general, the usability of a model class is greatly improved by the ability to simulate from a specific 
model. For instance, it enables model checking and parametric bootstrap. For evolutionary point 


processes specified by their GIF, Ogata’s modified thinning algorithm (Daley and Vere-Jones, 2003 


Algorithm 7.5.V.) provides a convenient and exact way to simulate realisations of the process. The 
algorithm requires piecewise upper bounds for the intensity A*(t) of the ground process Ng{t) : = 
A((0,f] X lU X /C). This intensity is determined as 

K,eK \k&k / \5=i j 

Nad-) / \ ^ 

+ E E 1(0,£](^ - tj) 9 {t - tj\Kj) / fis\Kj) ds 

7=1 VkS/C / 
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This function is bounded above by the GIF A*(t), which is defined by replacing g{t\hi) by the 
constant temporal interaction function g(t\K) = ma,xgtuln). This GIF is piecewise constant in time 

u>0 

as it only jumps at time points where any of the endemic covariates in in any tile ^ changes 

its value, or when the set of currently infectious individuals changes, i.e. whenever a new event 
occurs or a previous event stops triggering. 

Given a parameter vector 0, the ranges of interaction £ and 5, as well as a sampling scheme for 
the marks m^, the time point of the next infection starting from the current time t = to can be 
generated as follows: Draw an exponentially distributed random variate A with rate A*(to)- The 
simulated value of A is a proposal for the waiting time to the next event, i.e. the next time point 
of infection might he i = to + A. However, this proposal is not valid if the rate A*(t) had changed 
between to and t. In this case, time is set to the hrst changepoint after to and a new A is simulated. 
Eventually, a proposed time point i is valid. It is then accepted with probability A*(f)/A*(f). If it 
is rejected, time is set to t = i and a new waiting time A is simulated as above. If it is accepted, 
location s and type k of the event have to be simulated. At hrst, the source of infection is sampled 
with probabilities proportional to the respective components of A*(f): 


P (endemic source) ■ A*(f) 
P(source = event j) ■ X*gii) 



for j G {!,... ,Ng{t—)}. On the one hand, if the new event has an endemic source, then P(/i; = 
k) oc exp(/3o(fc)), k & )C, and P(s G A^) oc |A^| Pr{i),^ ^ = 1,..., M. In the sampled tile 

the location s is uniformly distributed. On the other hand, if the new event was triggered by the 
previous event j, then k ~ U{{k : = I}), and s = Sj + v, where v is drawn from the density 

f{s\Kj)/ f{s\hij) ds on e.g. using rejection sampling. 

A scheme of the described algorithm can be found as Web Appendix G. 


5. Application to the IMD Data 

Although visual comparisons between the hnetypes and heuristic comparisons of the estimates of 
separate hnetype-specihc models are possible, this does not allow to assess potential differences 
statistically. We thus conduct a joint analysis of the two hnetypes by the marked twinstim of 
Section |2.4[ We perform model selection for the joint point pattern of 630 cases of IMD with 


complete age and gender information by using AIG to compare all models with the GIF composed 
by subsets of the following terms: 

• Endemic component: common or hnetype-specihc intercept, linear time trend, time-of-year ehects 
(one or two harmonics), and linear ehect of weekly number of inhuenza cases registered in the 
district of a point (no time lag, lags 0 and 1, lags 0-2, or lags 0-3) taken from the SurvStat 


database (Robert Koch-Institut, 2009). 


• Epidemic component: gender, age (categorized as 0-2, 3-18 and ^ 19 years), hnetype, and age- 
hnetype interaction. 

As an ohset in the endemic component, we use the district-specihc population density (in¬ 
habitants per km^). A hxed hyperparameter of e = 30 days is assumed - this maximal temporal 


interaction range is consistent with the range used in, e.g., Zangwill et ah (1997). Because the 
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Table 1 

Parameter estimates for the endemic (top) and epidemic (bottom) component of the model with the lowest AIC 

(AIC=18968). The p-values correspond to Wald tests. 



Estimate 

Std. Error 

2 value 

P(|Z| > |2|) 

/3o 

-20.3652 

0.0872 

-233.53 

< 2 • 10“^® 

/3trend 

-0.0493 

0.0223 

-2.21 

0.027 

/?sin 

0.2618 

0.0649 

4.03 

5.5 • 10“°® 

Pcos 

0.2668 

0.0644 

4.14 

3.4 • 10“°® 

7o 

-12.5746 

0.3128 

-40.21 

<0 

1 

0 

CM 

V 

73-18 

0.6463 

0.3195 

2.02 

0.04310 

7^19 

-0.1868 

0.4321 

-0.43 

0.66558 

7c 

-0.8496 

0.2574 

-3.30 

0.00097 

logo- 

2.8287 

0.0819 




number of supposedly direct transmissions in the IMD dataset is humble, we will furthermore 
assume a constant temporal interaction function g (i.e. constant spread within the £ days) in order 
to not overparametrize the epidemic component. The spatial hyperparameter is hxed at 5 = 200 
km ~ this parameter needs only to be large enough not to influence the estimation of the actual 
spatial interaction function /. 

To restrict the model search, and hence computing time, we hrst performed the search for all 600 
models (2 ■ 2 ■ 3 ■ 5 conhgurations of the endemic component and 2 ■ 5 conhgurations of the epidemic 
component) with constant spatial interaction function /. Hereafter, the top 10 models of this search 
were investigated further with two Gaussian spatial interaction functions: one with joint variance 
parameter and one with hnetype-specihc variance parameter. 

The GIF of the resulting AIG-best model obtained by this search was Xg{t, s, k) = 

■ exp + Arend^ + Ain siu ( [t\ + (3cos COS ( [fj ^ 

+ exp (70 + 73-181 [3,18](age^) + 7^i9l[i9,oo)(agej) + 7cl{c}(«i)) A(s - Sj)- 


Here, {t,s,K) denotes days since 31 December 2001, coordinate in ETRS89 (kilometre scale) and 
hnetype. With [tj we denote monday of week r(f), i.e. the lower bound of time intervals Ci ,..., Co¬ 
in the linear predictor of the epidemic component, age group 0-2 and type B serve as reference 
categories. The corresponding parameter estimates of the best model, now htted to the 635 cases 
with available age, are found in Table [Tj 

Thus, there appears to be no noteworthy difference in the endemic behaviour of the two types: a 
linear downward time trend superimposed with one harmonic best describes the endemic behaviour 
of the point pattern (see Figure [3(a)| . An additional effect of past numbers of influenza cases does 
not improve the model. In contrast, there is an effect of past IMD cases, i.e. the process is indeed 
self-exciting. Gomparing the endemic-only model with the model enriched by an epidemic intercept 
only, greatly improves the £t (AAIG=202.84). In the epidemic component, there is a detectable 
dependence on marks with type G being less aggressive than type B. Figure 3(b) shows the resulting 
hnetype-specihc spatial interaction functions which for type G is T00% = 43% of type B. Finally, 
there is a signihcant age diherence in the infectivity of cases: the highest potential is found in the 
3-18 year old, which could be interpreted as the kindergarten and school-aged children having a 
higher contact behaviour than e.g. adults. 

Based on the selected model, basic reproduction numbers of /ib = 0.25 (95% GI 0.19-0.34) vs. 
yUc = 0.11 (95% GI 0.07-0.17) are obtained by calculating the type-specihc expectation of (|^ over 
the empirical distribution function of the additional covariates in the epidemic predictor (here: age 
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(a) (b) 

Figure 3. (a) Trend and seasonal component of the fitted model; one observes the typical IMD 

peak in late February and minimum in August. Furthermore, (b) shows the spatial interaction 
function multiplied by the type modiher illustrating the higher epidemic potential of type B. 


group). The conhdence intervals are given as the 0.025 and 0.975 quantiles of samples obtained by re¬ 
computing fx-Q and /ic for 999 additional coefficient vectors drawn from the asymptotic multivariate 
normal distribution of the parameter estimates in Table The conhdence intervals thus indicate 
a higher epidemic potential of the serogroup B hnetype. Note that these numbers are lower than 
what one would expect from the literature, e.g. Trotter et ah (2005) report an Rq estimate of 1.36 
for serogroup C. Two explanations account for this discrepancy: hrstly, our estimation is based on 
transmission between cases with invasive disease and not between asymptomatic carriers, who are 
not represented in disease surveillance data. Secondly, use of an endemic component means that 
our Rq estimates are destined to be lower, because sporadic cases do not contribute to the number 
of secondary cases. Still, our estimates provide realistic lower bounds for carriage reproduction 
numbers. 


ected spatio-te mporal point process model, we follow 
|l996) by computing 17 = K*(ti) - A*(fi_i), 


Rathbun 


To inspect the goodness-of-ht of the se 
the suggestion by Ogata (1988) (see also 
i = 2,... ,n, where A*(f) is the htted cumulative intensity function of the ground process. If the 

estimated GIF describes the true GIF well, then Ui = l-exp(-17) ~ [7(0,1). Figurej^a) contains a 
plot of the cumulative density function (GDF) of the observed Ui and for comparison the GDF of the 
U (0, l)-distribution together with error bounds computed by inverting the one sample Kolmogorov- 
Smirnov test. The £t appears good, but noticable deviations for Ui < 0.15 can be observed, which 
we suspect to occur due to the tie-breaking strategy of subtracting e = 0.01 days from ties. As 
observations are on a per-day basis and thus are interval censored we re-estimated the model for 
a data set where ties were broken by subtracting a 77(0, l)-distributed random number from each 
observation time. Figure |^b) shows the improved £t of this analysis - the relative changes in the 
parameter estimates are minor. 







































(a) e-scheme. (b) [7(0, l)-scheme. 

Figure 4. CDF of the observed Ui together with 95% Kolmogorov-Smirnov error bounds for data 
with tie breaking according to the (a) e scheme and (b) U{0, 1) scheme. 

Another way of assessing the goodness-of-£t is by simulation from the fitted GIF. Figure shows 
the observed 7-year incidences (per 100,000 inhabitants) of the 413 districts for both hnetypes 
together. In order to identify extreme observations that are not explained by the selected model, 
we simulated 100 realisations of the process and determined the 2.5% and 97.5% quantiles of 
the district-specihc 7-year incidences. In the hgure, districts with observed incidences outside the 
simulated 95%-range are marked by triangles. Many of the 17 districts with an excess are found 
around the city Aachen at the border to the Netherlands. The deviation from the model could thus 
be explained by edge effects hiding potential transmissions across the border. 

Altogether, we are led to the conclusion that the proposed model provides a useful description of 
the spread of IMD. It allows a quantification that the serogroup B hnetype has a higher epidemic 
potential than the serogroup C hnetype and shows age difference in spread behaviour. A sensitivity 
analysis conhrmed robustness of these results for increasing values of 6. Order and significance of 
the hnetype diherence in the epidemic component remained stable for e in the range of 1-5 weeks 
to 1-4 months. Age group results were slightly more varying: the 3-18 year olds remain having the 
highest epidemic potential, but from e > 35 days on, the oldest age group comes in second. The 
sensitivity analysis also showed, that there is too little information to estimate £ from the IMD 
data - we are thus forced to hx the hyperparameter at a biological plausible value. 


6. Discussion 

We presented a comprehensive framework for modelling, inference and simulation for infectious 
disease occurrence data. In the case of IMD, the infected individual is ehectively removed from the 
transmission network once the disease becomes manifest. Secondary cases are thought to acquire 
the infective strain either from the case during incubation or from asymptomatic carriers close to 
the case. Although marks attached to the case can naturally not account for the latter mode of 
















Figure 5. Observed incidence (per 100,000 inhabitants) during 2002-2008 for both hnetypes 
together. Triangles pointing up (down) indicate districts with a higher (lower) incidence than 
explained by 100 simulations from the model. 


transmission, they represent a valid proxy for the transmission network of the case when analysing 
surveillance data, which typically lack information regarding carriage. 

Despite use of disease surveillance data, we were able to quantify differences in IMD transmission 
dynamics based on age and hnetype. That the modelling requires an epidemic component is 
of epidemiologic interest in its own, as this shows that IMD incidence goes beyond sporadic 
occurrences. To our knowledge, our analysis is the hrst report of hnetype-specihc differences in 
spread tendencies. Contrary to previous analyses we were not able to hnd a signihcant connection 
between IMD and concurrent number of influenza cases. The spatial spread appeared to happen 
at a rather small scale - a scale which the usual district resolution data collected as part of the 
German Infection Protection Act does not allow to analyse. Thus our work is also a contribution 
to the controversy between patient privacy and the need for high-resolution data to gain new 
epidemiological insights. One important question in this debate is how good a proxy the patient’s 
residence is for his general whereabouts. 

Even though our GIF modelling is similar in form to the proposal in Hohle (2009), the continuous 
space of the IMD application makes epidemic modelling conceptually different. The classical SIR 
model framework does not apply in this situation, because events do not originate from a prede- 
hned population and individuals can not be partitioned into model compartments anymore. Thus, 
including population density becomes important and one needs to distinguish between covariate in¬ 
formation of events and covariates of the surrounding environment within which the process occurs. 
Furthermore, likelihood inference is complicated by requiring an additional integration over space 
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for complex polygons. Finally, the now proposed space-time interaction fnnctions are completely 
general in form and thus provide an advantage over the previous linear basis decomposition and 
resulting parameter constraints. 

An issue currently not dealt with in our estimation are edge effects, i.e. data are only available 


for Germany, but infections occur outside the observation window. For example, Elias et ah 


investigate the contribution of cross-border spread to increased incidence of IMD in the German 
region of Aachen neighbouring the Netherlands. A cross-border effect is indeed detected by our 
simulation in Figure where the Aachen region has higher observed incidences than can be 
explained by our model. Hence, the actual disease clusters are wider than observed in Germany, 
which potentially causes underestimation of the epidemic weight. Edge correction for inference in 
spatio-temporal point processes is, however, still an open methodological issue. 

An additional strength of the proposed modelling is that it offers a parametric framework for 
conducting prospective change-point analysis in spatio-temporal point processes typical in disease 
surveillance: Within the framework of stochastic process control one could e.g. use likelihood ratio 
detectors to monitor the time point where inclusion of an epidemic component is necessary to 
describe the observed data. This would correspond in idea to the time series setting investigated 


m 


Hohle and Paul (2008) or the homogeneous spatio-temporal Poisson process setting of Assungao 
and Gorrea ( 2009| ). 


The presented methods for inference and simulation of twinstim models are available as part of 


the R package surveillance (Hohle et al. 
Archive Network. 


2011 Hohle, 2007) available from the Gomprehensive R 


7. Supplementary Materials 

The Web Animation referenced in Section [T] and the Web Appendices referenced in Sections]^ and 
are available under the Paper Information link at the Biometrics website http: //www.biometrics. 
tibs.org/, 
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